ABSTRACT

The Chandra COSMOS Survey (C-COSMOS) is a large, 1.8 Ms, Chandra program that covers the
central contiguous ~ 0.92 deg^2 of the COSMOS field. C-COSMOS is the result of a complex tiling,
with every position being observed in up to six overlapping pointings (four overlapping pointings in
most of the central ~ 0.45 deg^2 area with the best exposure, and two overlapping pointings in most of
the surrounding area, covering an additional ~ 0.47 deg^2). Therefore, the full exploitation of the
C-COSMOS data requires a dedicated and accurate analysis focused on three main issues: 1) maximizing
the sensitivity when the PSF changes strongly among different observations of the same source (from
~ 1 arcsec up to ~ 10 arcsec half power radius); 2) resolving close pairs; and 3) obtaining the best
source localization and count rate. We present here our treatment of four key analysis items: source
detection, localization, photometry, and survey sensitivity. Our final procedure consists of two
steps: (1) a wavelet detection algorithm, to find source candidates, and (2) a maximum likelihood
Point Spread Function fitting algorithm, to evaluate the source count rates and the probability that
each source candidate is a fluctuation of the background. We discuss the main characteristics of this
procedure, which was the result of detailed comparisons between different detection algorithms and
photometry tools, calibrated with extensive and dedicated simulations.
Subject headings: X-rays; Surveys 



1. INTRODUCTION 

It is well known that X-ray surveys are an extremely
efficient tool to select Active Galactic Nuclei (AGN). For
example, in the XMM-Newton COSMOS survey, at the
0.5-2 keV limiting flux of 7 x 10^-16 erg s^-1 cm^-2, the AGN
surface density is ~1000 deg^-2 (Hasinger et al. 2007,
Cappelluti et al. 2007), a factor 2-4 greater than the
AGN surface density in the most recent deep optical
surveys, 250 deg^-2 in COMBO-17 (Wolf et al. 2003)
and 470 deg^-2 in the VVDS survey (Gavignaud et al. 2006).
There are four main causes for the higher efficiency of
X-ray surveys in finding AGN: 1) X-rays directly trace the
super massive black hole (SMBH) accretion, while AGN
classification through optical line spectroscopy may suffer
from incompleteness and/or misidentifications; 2) AGN are
the dominant X-ray population: most (~ 80%) of
the X-ray sources in deep and shallow surveys turn
out to be AGN, unlike at optical wavelengths; 3) 0.5-10 keV
X-rays (the typical Chandra and XMM-Newton
energy band) are capable of penetrating column densities
up to ~10^24 cm^-2, allowing the selection of moderately
obscured AGN; 4) low luminosity AGN are difficult to
select in optical surveys, because their light is diluted in
the host galaxy emission.

So far Chandra and XMM-Newton have performed several
deep, pencil beam surveys and shallower but wider surveys.
Fig. 1 compares the flux limit and area coverage of
the main Chandra and XMM-Newton surveys. This figure
shows that the XMM-Newton COSMOS and Chandra-COSMOS
(C-COSMOS, Elvis et al. 2009, Paper I hereafter)
surveys are the deepest surveys over large contiguous
areas. The coverage of larger areas at similar flux limits
is today achieved only by serendipitous surveys using
mostly non-contiguous areas (see e.g., CHAMP, Kim et
al. 2004a, 2004b, Green et al. 2004).

The Cosmic Evolution Survey (COSMOS, Scoville et
al. 2007) is aimed at studying the interplay between
the Large Scale Structure (LSS) in the Universe and the
formation of galaxies, dark matter, and AGN. The COSMOS
field is located near the equator (10h, +02 degrees) and
covers ~ 2 square degrees as originally defined by the
HST/ACS imaging (Koekemoer et al. 2007), with subsequent
deep and extended multi-wavelength coverage
overlapping this area. The size of COSMOS was chosen
to sample LSS up to a linear size of about 50 h^-1 Mpc at
z ~ 1-2, where AGN and star formation in galaxies are
expected to peak. To study the role of AGN in galaxy
evolution the X-ray data are fundamental. Therefore,
the central square degree of the COSMOS field has been
the target of a Chandra ACIS-I, 1.8 Msec Very Large
Program: the Chandra-COSMOS survey.

Fig. 1. The 0.5-2 keV flux range vs. the area coverage for various
surveys. The black solid lines represent a few serendipitous surveys:
HELLAS2XMM (Baldi et al. 2002, symbol A), CHAMP (Kim
et al. 2004a, 2004b, Green et al. 2004, symbol B), SEXSI (Harrison
et al. 2003, symbol C), XMM-BSS (Della Ceca et al. 2004, symbol
D), AXIS (Carrera et al. 2007, symbol E); the red dotted lines
represent a few deep pencil beam surveys: CDFN (Brandt et al. 2001,
Alexander et al. 2003, symbol F), CDFS (Giacconi et al. 2001, Luo
et al. 2008, symbol G), XMM-Newton Lockman Hole (Worsley et
al. 2004, Brunner et al. 2008, symbol H); the blue dotted lines
represent a few wide shallow contiguous surveys: C-COSMOS (Elvis
et al. 2009, symbol I), XMM-COSMOS (Hasinger et al. 2007,
Cappelluti et al. 2007, 2009, symbol L), ELAIS-S1 (Puccetti et
al. 2006, symbol M), E-CDF-S (Lehmer et al. 2005, symbol N),
AEGIS-X (Laird et al. 2009, symbol O), SXDS (Ueda et al. 2008,
symbol P). The black solid triangle represents the ROSAT all sky
survey (RASS, Voges et al. 1999).

The C-COSMOS survey has a rather uniform effective
exposure of ~ 160 ksec over a large area (~ 0.45 deg^2),
thus reaching ~ 3.5 times fainter fluxes than XMM-COSMOS
in both the 0.5-2 keV and 2-7 keV bands.
This flux limit is below the threshold where starburst
galaxies become common in X-rays. The sharp Chandra
Point Spread Function (PSF) allows nearly unambiguous
identification of optical counterparts (Civano et al.
2009, hereafter Paper III). Chandra secures the identifications
of X-ray sources down to faint optical magnitudes
(i.e., I ~ 26), with only ~ 2% ambiguous identifications,
significantly better than the ~ 20% ambiguous identifications
in XMM-Newton (Brusa et al. 2007).

The C-COSMOS survey has a complex tiling (see Fig. 2)
in comparison to other X-ray surveys, in which the
overlapping areas of the single pointings are small and
with similar PSFs (see e.g., the Extended Groth Strip,
AEGIS-X, Laird et al. 2009), or all the pointings are
co-axial and nearly totally overlapping (see e.g., CDFS,
Giacconi et al. 2001, Luo et al. 2008). In the C-COSMOS
tiling, the pointings are strongly overlapping and not
co-axial. While this ensures a very uniform sensitivity
over most of the field, each source is observed with up
to six different PSFs, requiring the development of an
analysis procedure for data observed with this mixture
of PSFs. The procedures presented in this paper are aimed
at optimizing (1) source detection, (2) localization, (3)
photometry, and (4) survey sensitivity. We have made
detailed comparisons between different detection algorithms
and photometry tools, testing them extensively
on simulated data. We furthermore validated our results
by detailed inspection of each single source candidate.
Our final analysis consists of two main steps:

1. A wavelet detection algorithm, PWDetect (Damiani
et al. 1997), is first used to find source candidates.
This algorithm is optimized to cleanly separate
nearby sources, to detect point-like sources
on top of extended emission, and to give the most
accurate positions.

2. A maximum likelihood PSF fitting algorithm is
then used to evaluate the source count rates and
the probability that each source candidate is not
a fluctuation of the background. We used the
emldetect algorithm (Cappelluti et al. 2007 and
references therein). emldetect works simultaneously
with multiple overlapping pointings using
PSFs appropriate to each one. This fitting method
ensures accurate evaluation of the survey completeness
and contamination, efficient deblending,
and good photometry for close pairs, which may
be partly blended even at the Chandra resolution.

As a third step, we also performed
aperture photometry for each candidate X-ray source
using 50%, 90%, and 95% encircled count fractions,
with the PSFs appropriate to each observation. The
aperture photometry is also used to check the results.
Aperture photometry is preferable in all cases where the
systematic errors introduced by PSF fitting are larger
than the statistical errors, i.e., for bright sources (count
rates > 1 counts/ksec).

The survey sensitivity is limited by both the net (i.e., 
including vignetting) exposure time, and by the ac- 
tual PSF with which a given region of the area is ob- 
served. The latter issue is particularly relevant for the 
C-COSMOS tiling. We have developed an algorithm 
that evaluates the survey sensitivity at each position 
on C-COSMOS using a parameterization of the Chan- 
dra ACIS-I PSF and taking into account the mixture of 
PSFs at each position. The resulting sensitivity maps 
have been compared and validated with extensive simu- 
lations. 

The paper is organized as follows: in Sect. 2 we
briefly present the C-COSMOS observations and data 
reduction; we describe the simulations in Sect. 3; how 
they were used to select the most efficient detection algo- 
rithm and the final source characterization procedure is 
described in Sect. 4; the completeness and reliability are 
shown in Sect. 5; in Sect. 6 we apply this procedure to 
the observed data; in Sect. 7 we present the calculation 
of survey sensitivity, the sky-coverage, and X-ray number 
counts using the simulated data. Finally, in Sect. 8 we 
compare C-COSMOS to a similar Chandra survey, i.e., 
AEGIS-X, and in Sect. 9 we give our conclusion. 

2. OBSERVATIONS AND DATA REDUCTION 



C-COSMOS data analysis 







Fig. 2. The final tiling of the C-COSMOS field, with a color
scale showing the number of overlapping ACIS-I pointings, as
indicated in the color bar at the bottom of the figure.

We give here a brief description of the observations and
data reduction. The full details are given in Paper I. The
C-COSMOS field covers a contiguous area of ~ 0.92 deg^2,
centered at 10h 00m 18.91s +02° 10' 33.48", near the
center of the full COSMOS field. The survey is made up
of 36 different heavily overlapping ACIS-I pointings, each
with a mean exposure of ~ 50 ksec, for a total exposure
of 1.8 Msec. Twelve of the 36 pointings were scheduled
as two or more separate observations, with very similar
roll-angles, thus resulting in 49 observations in total. Fig.
2 shows the number of ACIS-I pointings per pixel. Note
that the central ~ 0.45 deg^2 area is covered by four to
six overlapping pointings, while most of the outer ~ 0.47
deg^2 area is covered by one to two overlapping pointings.
As an example, Fig. 3 shows the image of the same
source observed in four overlapping fields at different
off-axis angles.

The 49 observations were processed using the standard
CIAO 3.4 software tools (see note 13; Fruscione et al. 2006). Event
files were cleaned of bad pixels, soft proton flares and
cosmic-ray afterglows, and were brought to a common
reference frame by matching the positions of bright
X-ray sources with the optical positions of bright (18<I<23),
point-like optical counterparts. The systematic shifts
between the X-ray and optical positions are ΔRA = 0.04"
and ΔDEC = 0.25" (see Paper I). Observations with the
same aim points and consistent roll-angles were merged
together, producing 36 event files, one for each independent
pointing.

13 http://cxc.harvard.edu/ciao/

The flux limits for source detection are influenced by
three main factors: (1) net exposure time, (2) background
per pixel, and (3) size of the source extraction
region, which in turn depends on the size of the PSF
at the given position. The Chandra ACIS-I on-axis PSF
has a spatial resolution of 0.5" FWHM, equivalent to
< 4-4.5 kpc at any redshift, and permits observations
of up to ~ Msec to be photon limited. The adopted
tiling produces a rather homogeneous exposure time over
the C-COSMOS field (i.e., ±12% in the central ~ 0.45
deg^2 area) and a uniform background. In the
vignetting-corrected exposure time we clearly distinguish two main
peaks at 80 and 160 ksec (see Fig. 7 of Paper I). Fig. 4
shows the fraction of the C-COSMOS area with a given
background per square arcsec in the three analyzed energy
bands: 0.5-7 keV (full band, F), 0.5-2 keV (soft
band, S), and 2-7 keV (hard band, H). We see two main
peaks at 0.07 and 0.14 counts/arcsec^2 in the F band and
at 0.02 and 0.04 counts/arcsec^2 in the S band, corresponding
to the two main peaks of the exposure time distribution. These
peaks correspond to a level of ~ 2 and ~ 4 counts in
the F band, and ~ 0.6 and ~ 1.2 counts in the S band
over an area of 3 arcsec radius, a typical source detection
region for off-axis angles less than 5-6 arcmin. Even the
area with the largest exposure time therefore has a relatively
low background for point source detection; this is
important for the detection of the faintest sources.
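
The background levels quoted above can be checked with a few lines of arithmetic. The sketch below simply multiplies each background peak by the area of a circle of 3 arcsec radius; the function name is illustrative and not part of any analysis package.

```python
import math

def background_counts(surface_brightness, radius_arcsec=3.0):
    """Expected background counts inside a circular region.

    surface_brightness: background in counts per square arcsec,
    integrated over the exposure (as quoted in the text).
    """
    return surface_brightness * math.pi * radius_arcsec ** 2

# Background peaks quoted in the text for the F and S bands.
peaks = {"F": (0.07, 0.14), "S": (0.02, 0.04)}
for band, values in peaks.items():
    for sb in values:
        print(band, sb, round(background_counts(sb), 2))
```

The four values come out close to 2, 4, 0.6 and 1.1 counts, consistent with the rounded levels given in the text.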

3. GENERATION OF SIMULATED DATA 

Extensive simulations were performed in order to test 
various source detection schemes. The simulations were 
used (1) to test the reliability of the source position re- 
construction, (2) to verify the count rate reconstruction, 
and (3) to assess and validate the level of significance 
of each detected source at each given detection thresh- 
old and thus to evaluate the level of completeness of the 
source list as a function of flux. 

3.1. Creating the simulated input source catalog 

In order to include realistic source clustering into the
simulated data, we sampled particles from a COSMOS
Mock galaxy catalog (V3.0) derived by Kitzbichler and
White (2008). They made use of the Millennium Simulation
(Springel et al. 2005), a very large simulation which
follows the hierarchical growth of dark matter structures
from redshift z=127 to the present. The simulation assumes
the concordance ΛCDM cosmology and follows
the trajectories of 2160^3 (~ 10^10) particles in a periodic
box 500 h^-1 Mpc on a side, using a special reduced-memory
version of the GADGET-2 code (Springel et al.
2001; Springel 2005). The formation and evolution of the
galaxy population is simulated by using a semi-analytical
model (Croton et al. 2006, De Lucia & Blaizot 2007).
We randomly selected 10000 mock galaxies per square
degree in the ad hoc redshift range 0.4 < z < 0.9 and i
band magnitude range 17 < i < 26. The selected random
sources in this redshift-magnitude range show the
same angular correlation function (ACF) as the S band
XMM-COSMOS sources (Miyaji et al. 2007), as shown
in Fig. 5, not taking into account that the clustering
strength could depend on the survey flux limit (Plionis
et al. 2008). The agreement between the ACF of the
random sample and the XMM-COSMOS sample is good
down to the 0.5 arcmin scale. Below 0.5 arcmin, the
uncertainties in the S band XMM-COSMOS ACF
and the other X-ray ACFs from the literature (see e.g., the
Chandra Deep Field South, D'Elia et al. 2004) are too
large to allow them to be sensibly compared with the one
we derive. Each simulated galaxy was then assigned an
S band flux, randomly drawn from the number weighted
logN - logS relation of the AGN population synthesis
model by Gilli et al. (2007). The corresponding minimum
S band flux for the input particles was ~ 3 x 10^-18
erg s^-1 cm^-2, which is a factor 100 below the detection
limit of C-COSMOS. Hence background fluctuations due
to unresolved faint sources are included in the simulations.

Fig. 3. The image of the same source (i.e., source-id 50 in the C-COSMOS catalog presented in Paper I) observed in four overlapping
fields at different off-axis angles. The contours are drawn at 90%, 50%, 25%, and 10% of the peak counts. The red circles centered on the
position of the source have a radius of 2 arcsec.

Fig. 4. Area fraction for a given background per square
arcsec in the F band (solid blue histogram), S band (dashed green
histogram), and H band (empty red histogram).

The S band flux of each source was then converted into
an F band flux assuming a power-law spectrum with an
energy index α_E = 0.4 (see note 14). The simulated sources cover a
3 deg^2 sky area, which is enough to completely enclose
the COSMOS field.
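
As a rough illustration of the S to F band conversion, the sketch below integrates an unabsorbed power law with energy index 0.4 over the two bands. It assumes that the flux density scales as E^(-α_E) and ignores Galactic absorption and any instrumental response; these simplifications and the function name are assumptions made for the example, not a description of the conversion actually used for the simulations.

```python
import math

def band_flux_ratio(alpha_e, band_out, band_in):
    """Ratio of energy fluxes in two bands for f_E proportional to E**(-alpha_e).

    Bands are (low, high) tuples in keV. Unabsorbed spectrum assumed.
    """
    def integral(lo, hi):
        if abs(alpha_e - 1.0) < 1e-12:
            # Special case: the integral of 1/E is logarithmic.
            return math.log(hi / lo)
        p = 1.0 - alpha_e
        return (hi ** p - lo ** p) / p

    return integral(*band_out) / integral(*band_in)

# F band (0.5-7 keV) flux per unit S band (0.5-2 keV) flux.
ratio = band_flux_ratio(0.4, (0.5, 7.0), (0.5, 2.0))
print(round(ratio, 2))
```

Under these assumptions the F band flux is close to three times the S band flux.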



3.2. Creating the simulated X-ray event files 

Using the MARX simulator (version 4.2.1; see note 15), we simulated
a set of 49 Chandra ACIS-I pointings with the same
exposure times, aim points, and roll-angles as the real
C-COSMOS pointings (see Paper I). The simulated source
list was fed into each simulated pointing and the net source
counts were recorded. This procedure returns 49 Chandra
event files containing only source photons.

14 f_E ∝ E^-Γ, with Γ = α_E + 1

15 http://space.mit.edu/CXC/MARX/ 
Fig. 5. The ACF of the sources selected from the COSMOS
Mock catalog for input to the C-COSMOS S band simulation (red
squares) compared with that of XMM-COSMOS (S band, blue
solid dots).



To include a background appropriate to each pointing
we used the CXC compilation of blank sky fields (see note 16).
These blank fields lie at high Galactic latitude, away from
soft bright features such as the North Polar Spur, and
have a median exposure of ~ 70 ks. Point-like and extended
sources down to fluxes that would be detectable
in each exposure have been excluded, and the individual
exposures have been stacked into different blank sky
files. We chose the stacked blank sky file appropriate for
ACIS-I data at the epoch of our observations (see note 17), filtered
to keep only photons detected in VFAINT mode observations.
This blank sky field has a total effective exposure
of ~ 1.5 Msec.

We then extracted 49 background event files by ran- 
domly resampling the events out of the blank sky file 
scaling by the exposure time of each observation. Faint 
simulated sources with only a few counts would not be de- 
tected and increase the background level by ~ 5% at the 
depth of the blank sky observations. Since these faint, 
unresolved sources are already included in the blank sky 
files, in order to avoid counting them twice, we removed 
5% of the photons in each background event file. 
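
The background construction described above can be sketched as follows. The example draws events from a blank sky list in proportion to the exposure ratio and then applies the 5% reduction in a single step. Sampling without replacement, the function name, and the parameter names are assumptions made for illustration; the text does not specify how the resampling was implemented.

```python
import random

def resample_background(blank_sky_events, blank_exposure, obs_exposure,
                        double_count_fraction=0.05, seed=None):
    """Draw a background event list for one observation.

    The number of events is scaled by the exposure ratio and reduced by
    the fraction attributed to faint sources that are already present in
    the simulated source list (5% in the text).
    """
    rng = random.Random(seed)
    scale = obs_exposure / blank_exposure
    n_keep = round(len(blank_sky_events) * scale * (1.0 - double_count_fraction))
    # Sampling without replacement is an assumption of this sketch.
    return rng.sample(blank_sky_events, n_keep)

# Toy example: 30000 blank sky events, 1.5 Msec blank sky, 50 ksec pointing.
events = list(range(30000))
background = resample_background(events, 1.5e6, 5.0e4, seed=1)
print(len(background))
```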

The background files were then reprojected to the co- 
ordinates of the real pointings by using the real aspect 
solution files, and then combined with the corresponding 
source event files. The final result is a set of 49 simulated 
ACIS-I fields that closely mirror the actual 49 observa- 
tions. 

4. CHOOSING THE C-COSMOS SOURCE 
DETECTION AND CHARACTERIZATION 
PROCEDURE 

Fully exploiting the large and deep C-COSMOS
coverage required particular care. To maximize
areal coverage and produce uniform depth, C-COSMOS
used a complex tiling, with four overlapping pointings in
most of the central ~ 0.45 deg^2 area with the best exposure,
and two overlapping fields in most of the surrounding
area, covering an additional ~ 0.47 deg^2 (see Fig. 2).
As a result, each source is observed at different off-axis
angles, θ_i (i.e., the distance of the source position from
the aim point in all overlapping fields), and thus with
different PSFs. For some sources in the central area the
number of different θ_i is as high as six. This mixture
of PSFs requires addressing three main issues: (1) maximizing
the sensitivity when the PSF changes so widely
between different observations of the same source (from
~ 1 arcsec to ~ 10 arcsec half power radius); (2) maximizing
the spatial resolution, to obtain the best
source localization and effective deblending; (3) obtaining
accurate photometry, even in cases of partly blended
sources. To solve these issues a dedicated analysis procedure
was developed, and the simulations were used to
determine and validate it.

16 http://cxc.harvard.edu/contrib/maxim/acisbg/

17 http://cxc.harvard.edu/contrib/maxim/acisbg/data/acisi_D_01236.bg_evt_010205.fits

We tested sliding cell and wavelet algorithms to find
and locate source candidates, and both PSF fitting and
aperture photometry. In particular, we compared the
results obtained using the SAS eboxdetect (see note 18) and
emldetect (see note 19) tasks, used for the XMM-COSMOS survey
(Cappelluti et al. 2009), with those obtained using the PWDetect
code (Damiani et al. 1997) and CIAO wavdetect (see note 20;
Freeman et al. 2002). We compared PWDetect and
CIAO wavdetect on a data subset including 8 ACIS-I
fields and found consistent results. We adopted PWDetect
as the main wavelet algorithm because of its much
faster processing time (i.e., a factor of 40-50) with respect
to CIAO wavdetect.

4.1. PWDetect 

The PWDetect code (Damiani et al. 1997) was originally
developed for the analysis of ROSAT data, and
was then adapted for the analysis of Chandra and
XMM-Newton data. This method is particularly well suited for
cases in which the PSF varies across the image, as for
Chandra images, since PWDetect is based on the wavelet
transform (WT) of the X-ray image, i.e., a convolution of
the image with a "generating wavelet" kernel, which depends
on position and on a length scale that is a free parameter.
For the Chandra data, the length scale is varied from
0.35" to 16" in steps of [...]. This choice spans the range
from the smallest to the largest (for large θ_i) Chandra
PSFs. Both radial and azimuthal PSF variations are accounted
for by PWDetect, which first assumes a Gaussian
PSF and then corrects by a PSF shape factor, calibrated
on both radial and azimuthal coordinates. PWDetect was
run on each of the 36 event files with a low significance
level of ~ 10^-3, so as to include entries with just 5 source counts
(i.e., to pick up most of the input sources). The catalogs
of source candidates from overlapping fields were then
merged. The off-axis angle θ_i is recorded and the source
position measured at the smallest θ_i (i.e., with the best
PSF) is adopted as the reported source position. If a
candidate is not detected in one or more of the overlapping
fields, the count rate is computed at the position
of the source candidate and within a circle of radius R,
corresponding to 90% of the encircled count fraction of
the PSF (f_psf = 90%, see note 21) at θ_i, as calibrated by the CXC (see note 22).
Finally, a mean count rate, weighted by the count
rate errors, is associated with each source. Analysis of the
simulated data showed that all candidates with a wavelet
size smaller than the PSF size and fewer than 5 counts are
spurious detections. These were then excluded from the
candidate catalogs.

18 http://xmm.esac.esa.int/sas/8.0.0/eboxdetect/

19 http://xmm.esac.esa.int/sas/8.0.0/emldetect/

20 http://asc.harvard.edu/ciao/ahelp/wavdetect.html
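
The merging of overlapping fields described in Sect. 4.1 can be illustrated with a minimal sketch. It assumes inverse-variance weights for the error-weighted mean count rate, which is one common choice; the text does not state the exact weighting. The dictionary keys and function names are hypothetical.

```python
def weighted_mean_rate(rates, errors):
    """Mean count rate weighted by 1/error**2 (assumed weighting scheme).

    Returns the weighted mean and its formal uncertainty.
    """
    weights = [1.0 / e ** 2 for e in errors]
    total = sum(weights)
    mean = sum(w * r for w, r in zip(weights, rates)) / total
    return mean, total ** -0.5

def best_position(detections):
    """Adopt the position measured at the smallest off-axis angle.

    detections: list of dicts with keys 'theta' (arcmin), 'ra', 'dec' (deg).
    """
    best = min(detections, key=lambda d: d["theta"])
    return best["ra"], best["dec"]

# One source seen in three overlapping fields (made-up numbers).
detections = [
    {"theta": 7.5, "ra": 150.1003, "dec": 2.2001},
    {"theta": 2.1, "ra": 150.1001, "dec": 2.2000},
    {"theta": 5.0, "ra": 150.1002, "dec": 2.1999},
]
print(best_position(detections))
print(weighted_mean_rate([1.0, 3.0], [1.0, 2.0]))
```

With this weighting, measurements with small errors dominate the mean, while the position is taken from a single field rather than averaged.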

4.2. EBOXDETECT and EMLDETECT 

Both eboxdetect and emldetect are part of the XMM-Newton
SAS package and are based on programs originally
developed for detection in ROSAT images (see
e.g., Voges et al. 1999). eboxdetect is a standard sliding
cell detection tool, which is run on each of the 49 single
observations. eboxdetect produces a list of candidate
sources down to a selected low significance level. The list
of source candidates is then passed to the emldetect task.
emldetect performs a simultaneous maximum likelihood
PSF fit for each candidate to all the images at each
position (see e.g. Cappelluti et al. 2007 for more details
on eboxdetect and emldetect). eboxdetect was run with
a low significance level (DET_ML = 3 or P_random = 0.05),
to provide emldetect with a list of source candidates
that includes all possibly significant sources.

emldetect has been adapted to run on Chandra data by
replacing the XMM-Newton PSF library with the Chandra
PSF library (see note 22), and to work with many
different PSFs simultaneously. The counts at each position
were fitted using a model obtained by convolving
the PSF at that position with a β model (Cruddace,
Hasinger and Schmitt 1988). The program interpolates
over the calibration library of Chandra PSFs to find the
most appropriate PSF at the position of each source in
each observation. The more crowded the field, the
more candidates are fitted simultaneously. emldetect can
provide both source positions and source count rates, or
only source count rates using fixed source positions. We
ran it fitting for both source positions and count rates.
The best fit maximum likelihood, DET_ML, is related
to the Poisson probability that a source candidate is a
random fluctuation of the background (P_random):



DET_ML = -ln(P_random)                (1)

Sources with low values of DET_ML, and correspondingly
high values of P_random, are then likely to be background
fluctuations.
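
Equation (1) is easy to evaluate numerically. The short sketch below converts between DET_ML and P_random for the two thresholds used in this section; the function names are illustrative only.

```python
import math

def p_random(det_ml):
    """Probability of a background fluctuation for a given DET_ML (Eq. 1)."""
    return math.exp(-det_ml)

def det_ml(p):
    """DET_ML corresponding to a given fluctuation probability (Eq. 1)."""
    return -math.log(p)

for threshold in (3, 12):
    print(threshold, f"{p_random(threshold):.2e}")
```

DET_ML = 3 corresponds to P_random of about 0.05, and DET_ML = 12 to about 6 x 10^-6.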

4.3. Tests on simulations 

We ran both detection algorithms on the simulated 
data. Catalogs of candidates were produced with both 
eboxdetect and PWDetect. These lists were visually in- 
spected to identify obviously spurious detections on the 
wings of the Chandra PSF around bright sources, and 
near the edges of the ACIS-I chips. For both detection 
algorithms, the number of these clearly spurious detections
is rather small in all three bands (< 1-2%). These
entries were deleted and the 'cleaned' lists used as in- 
put for the emldetect tool. The emldetect output catalog 

21 f_psf indicates the fraction of the source counts distributed in a
circular area, following the PSF shape.

22 http://cxc.harvard.edu/caldb/ 



TABLE 1

Comparison between eboxdetect+emldetect and PWDetect

Parameter                                 eboxdetect+emldetect   PWDetect
(1)                                       (2)                    (3)

Comparison on source position:
<Δ R.A.> (a)                              0.17" ± 0.16"          0.02" ± 0.15"
<Δ Dec.> (a)                              -0.18" ± 0.15"         0.003" ± 0.15"
Δ R.A. RMS (b)                            0.32"                  0.31"
Δ Dec. RMS (b)                            0.35"                  0.34"

Comparison on completeness of close pairs:
% of missed pairs (c)                     75%

Comparison on source photometry:
<output/input count rate>, F band (d)     0.97 ± 0.11            0.86 ± 0.12
<output/input count rate>, S band (d)     1.00 ± 0.12            0.94 ± 0.14
<output/input count rate>, H band (d)     1.05 ± 0.16            0.88 ± 0.17

Note. Column (1) shows the parameters used to test the
accuracy of source localization, the completeness in the recovery
of close pairs, and the flux reconstruction of the two detection
algorithms that we used. Columns (2) and (3) show the results
for the eboxdetect+emldetect and the PWDetect algorithms,
respectively.

(a) The median and interquartile range of the shifts between the R.A. or
Dec. of the input sources and the R.A. or Dec. of the detected
sources, see also Fig. 6.

(b) The RMS of the R.A. or Dec. shifts between input and detected
positions, see also Fig. 6.

(c) Percentage of the pairs with a separation smaller than 4 arcsec
that are missed in comparison to PWDetect, see also Fig. 7.

(d) The median and interquartile range of the ratio between the output
detected and input simulated count rates in the F, S, and H bands,
see also Fig. 8.



was then cut at a conservative value of DET_ML = 12
(P_random < 6 x 10^-6), to ensure that the number of
spurious detections in this catalog is practically zero, so
that the results are not contaminated by spurious associations.

Matched catalogs between the input simulated cata- 
log and the emldetect and PWDetect output catalogs 
were produced using two methods: (1) a conservative 
approach, using a fixed matching radius of 0.5 arcsec. 
This produces matched catalogs which probably miss a 
fraction of real associations, but are virtually free from 
spurious associations. (2) A maximum likelihood algo- 
rithm, to find the most probable association between an 
input source and an output detected source. We used 
the catalogs produced using the first method to study 
the accuracy of source localization and flux reconstruc- 
tion, while we used the catalogs produced by the second 
method to study the completeness and reliability of the 
detection algorithms (see Sect. 5). 
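
The conservative matching method (1) can be sketched as a nearest-neighbour search within a fixed radius of 0.5 arcsec. The example uses a flat-sky approximation, which is adequate for sub-arcsec separations near the equator; the function names and the brute-force loop are choices made for illustration and do not reproduce the code actually used.

```python
import math

def separation_arcsec(ra1, dec1, ra2, dec2):
    """Small-angle separation in arcsec; coordinates in degrees."""
    mean_dec = math.radians(0.5 * (dec1 + dec2))
    dra = (ra1 - ra2) * math.cos(mean_dec)
    ddec = dec1 - dec2
    return 3600.0 * math.hypot(dra, ddec)

def match_fixed_radius(inputs, detections, radius_arcsec=0.5):
    """Return (input index, detection index, separation) for each match."""
    matches = []
    for i, (ra_in, dec_in) in enumerate(inputs):
        best_j, best_sep = None, radius_arcsec
        for j, (ra_det, dec_det) in enumerate(detections):
            sep = separation_arcsec(ra_in, dec_in, ra_det, dec_det)
            if sep <= best_sep:
                best_j, best_sep = j, sep
        if best_j is not None:
            matches.append((i, best_j, best_sep))
    return matches

# Two input sources; the second detection is 1 arcsec away and is rejected.
inputs = [(150.0, 2.0), (150.1, 2.1)]
detections = [(150.0, 2.0 + 0.3 / 3600.0), (150.1, 2.1 + 1.0 / 3600.0)]
matches = match_fixed_radius(inputs, detections)
print(matches)
```

A small radius such as this one trades completeness of the matched list for a very low rate of chance associations, which is the trade-off described in the text.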

Table 1 summarizes the comparison of the results of
the application of eboxdetect+emldetect and PWDetect
on simulated data.

We first compared the best-fit source coordinates provided
by emldetect and PWDetect with the input source
positions (see Tab. 1). The RMS variations and the
interquartile ranges of the shifts are similar for the two detection
algorithms; however, we find a small systematic median
shift between input and detected R.A. and Dec. (see also
Fig. 6) using emldetect. We conclude that PWDetect
provides positions of higher quality.

As a second step, we focussed on the ability of the detection
algorithms to separate close pairs of sources in
Chandra data, comparing the numbers of pairs found by
emldetect and PWDetect (see Fig. 7 and Tab. 1). The
two algorithms are equivalent for large (>4") separations,
but there is a deficiency in the number of pairs
recovered by emldetect at small (<4") separations. We
verified that all of the ~ 75% of the pairs with a separation
smaller than 4 arcsec missed by emldetect are in the input
source list, and not spuriously created by the splitting of
a single source. Analysis of the eboxdetect candidate list
and emldetect final list shows that the majority (>70%)
of these pairs are missed in the emldetect step, where the
program finds a best fit including one significant source
only, while the second falls below the detection threshold.
We conclude that PWDetect is more efficient than
emldetect at resolving close pairs with separations <4"
and greater than ~ 1.8".

Finally, we compared the emldetect and PWDetect best-fit
count rates with the input count rates in the F, S,
and H bands (see Fig. 8 and Tab. 1). The PWDetect
reconstructed count rates were systematically smaller than
the input count rates by 10-20%. A similar problem was
found by Puccetti et al. (2006) using a similar detection
algorithm on XMM-Newton data. emldetect reconstructs
the count rates much better in all the bands.

The accuracy of the count rate reconstruction of the
emldetect algorithm is also good at all count rates, without
any large systematic shifts, both at low count rates
and at high count rates (see left panel of Fig. 9). The
right panel of Fig. 9 shows the difference between the
emldetect count rate and the input simulated count rate
divided by the emldetect error on the count rate, as a
function of the emldetect count rate. We see that the
distribution is approximately centered around zero for
count rates smaller than ~ 0.5 counts/ksec, but becomes
positive for larger count rates. This suggests that at high
count rates there is a non-negligible systematic error in
the emldetect count rate determination, due to the uncertainties
in the PSF model becoming comparable to, or
higher than, the statistical error. For this reason we also
performed aperture photometry (see Sect. 6.4), which
should be free from this systematic error.
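
The quantity plotted in the right panel of Fig. 9 is a normalized residual, which can be computed as in the sketch below. The split at 0.5 counts/ksec follows the text; the function names and the toy numbers are assumptions made for the example.

```python
import statistics

def normalized_residuals(measured, true, errors):
    """(measured - input) / error for each matched source."""
    return [(m - t) / e for m, t, e in zip(measured, true, errors)]

def median_residual_by_rate(measured, true, errors, split=0.5):
    """Median normalized residual below and above a count rate split."""
    res = normalized_residuals(measured, true, errors)
    faint = [r for r, m in zip(res, measured) if m < split]
    bright = [r for r, m in zip(res, measured) if m >= split]
    faint_med = statistics.median(faint) if faint else None
    bright_med = statistics.median(bright) if bright else None
    return faint_med, bright_med

# Toy count rates in counts/ksec (not real data).
measured = [0.1, 0.2, 1.2, 2.3]
true = [0.1, 0.25, 1.0, 2.0]
errors = [0.05, 0.05, 0.1, 0.1]
print(median_residual_by_rate(measured, true, errors))
```

A median near zero indicates unbiased count rates, while a positive median in the bright bin would reproduce the kind of systematic offset described above.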

4.3.1. Error on the positions 

The source positional error is proportional to the PSF size at
the position of the source, and inversely proportional to the
square root of the source counts. We evaluated the errors on
the positions by dividing the PSF radius by (1) the square root
of the total source plus background counts ($T$,
PosError $= \mathrm{PSF}_{radius}/\sqrt{T}$) and (2) the square
root of the net, background subtracted, source counts ($C_s$,
PosError $= \mathrm{PSF}_{radius}/\sqrt{C_s}$). We used
different $\mathrm{PSF}_{radius}$, from 50% to 90% of the
$f_{psf}$ at the position of each source in the field where the
source is detected at the smallest $\theta_i$ (i.e., with the
best PSF).

These errors were then compared with the deviations between the
X-ray positions and the input positions in the simulations.
Method (2) gave the best match using a $\mathrm{PSF}_{radius}$
corresponding to the 50% $f_{psf}$ at the $\theta_i$ of each
source, and the counts included in a circular region with the
same radius. We used the $f_{psf}$ - $\theta_i$ calibration
provided by CXC (see note 19). Larger $\mathrm{PSF}_{radius}$
values gave implausibly large position errors for bright
sources. Including background counts (method 1) produces errors
that are too small for faint sources, where the background is
not negligible. For ~ 60 sources with more than ~ 120 counts,
the errors on RA and Dec are formally smaller than 0.07 arcsec
(i.e., errors on source position smaller than 0.1 arcsec). In
these cases the error on source position was conservatively set
to 0.1 arcsec to account for possible small systematic errors
in the astrometric corrections (see Sect. 2 and 4.3). Fig. 10
shows the distribution of the ratio between the deviation of
the PWDetect positions from the input positions and the X-ray
error on the position evaluated as in method (2). The
distributions in the three detection bands are similar and peak
at a value of ~ 0.7-0.8. These distributions are compared with
the expectation based on Gaussian statistics, which peaks at
unity. This comparison shows that the assumed errors on the
positions, although very small, are, on average, somewhat
larger than the deviation between input positions and detected
positions. However, to account for small systematic errors in
the astrometric corrections, which are not included in the
input positions while they certainly affect the observed data,
we use in the following the conservative errors on the
positions computed as described above.
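
As an illustration of method (2), the following minimal sketch computes the positional error from a PSF radius and the net counts, and applies the conservative 0.1 arcsec floor described above. The function and argument names are hypothetical; the PSF radius is assumed to be the 50% $f_{psf}$ radius already looked up from the CXC calibration for the best off-axis angle.

```python
import math


def positional_error(psf_radius_arcsec, net_counts, floor_arcsec=0.1):
    """Positional error as PSF radius / sqrt(net counts), in arcsec.

    psf_radius_arcsec: assumed to be the 50% f_psf radius at the source
        off-axis angle (hypothetical input, taken from a calibration table).
    net_counts: background-subtracted counts inside that radius.
    floor_arcsec: conservative minimum error (0.1 arcsec in the text).
    """
    if net_counts <= 0:
        raise ValueError("net_counts must be positive")
    error = psf_radius_arcsec / math.sqrt(net_counts)
    # Never quote an error below the floor, to allow for small
    # systematic errors in the astrometric corrections.
    return max(error, floor_arcsec)


if __name__ == "__main__":
    print(positional_error(1.0, 4))    # faint source: 0.5 arcsec
    print(positional_error(1.0, 400))  # bright source: floored at 0.1 arcsec
```

The floor only affects bright sources, where the formal statistical error would otherwise fall below the expected astrometric systematics.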

4.4. The final C-COSMOS source detection and 
characterization procedure 

In summary, the comparison of the two methods, PWDetect and
eboxdetect+emldetect, on the simulated C-COSMOS field shows
that PWDetect is superior in separating closely spaced sources
and in localizing sources, and relatively poor at photometry.
Conversely, emldetect
is poor at separating closely spaced sources, while it is 
good at estimating source reliability, completeness, and 
photometry. These results suggested the following source 
detection and characterization procedure: 

1- PWDetect is run first with a low threshold to pro- 
duce a catalog of source candidates, with the best 
localization; 

2- this catalog of source candidates is used as input 
for emldetect which performs a PSF fitting to find 
the best fit maximum likelihood source count rate 
and the probability that each source candidate is a 
fluctuation of the background. In emldetect the co- 
ordinates used to fit each source are those provided 
by PWDetect for the most on-axis observation; 

3- aperture photometry is used to get good photome- 
try for bright sources. 

This combined approach allows us to obtain both the 
best possible position determination and reliable pho- 
tometry for all sources. 

5. COMPLETENESS AND RELIABILITY 

The threshold for source detection must be set by balancing
completeness (the fraction of true sources detected, i.e., the
ratio between the number of detected sources and the number of
input simulated sources) against reliability (one minus the
fraction of spurious sources detected, i.e., one minus the
ratio between the number of spurious sources and the number of
input simulated sources). Our simulations allow us to choose a
threshold which has a known completeness and reliability. The
three panels of Fig. 11 show the completeness in the F, S, and
H band as a function of the significance level for sources with
at least 12 counts (solid lines) and 7 counts (dashed lines).
The latter value refers to the counts of a typical source close
to our flux limit, where we expect a rather large
incompleteness. The former value (12 counts) ensures
significantly higher completeness. Fig. 11 also shows the
reliability as a function of the significance level for the
same two cases. We chose a significance level of
$2 \times 10^{-5}$ (or DET_ML=10.8), which represents a
reasonable compromise between high completeness and high
reliability. Higher significance levels (i.e., larger chance
probabilities) give higher completeness but lower reliability.
At the chosen threshold we have 87.5% and 68% (F band), 98.2%
and 83% (S band), 86% and 67% (H band) completeness for sources
with at least 12 and 7 counts, respectively. At the same
significance level and the same counts limits, the reliability
is ~ 99.7% for the three bands and both source count limits.
This implies about 5, 4, and 3 spurious detections with > 7
counts in the F, S, and H bands, respectively, and 3, 4, and 3
spurious detections with > 12 counts in the same bands.

Fig. 12 shows the completeness for a significance level of
$2 \times 10^{-5}$ as a function of the flux for the F, S, and
H bands. Table 2 gives the flux limits corresponding to 4
completeness fractions in the F, S, and H bands.

We have also evaluated the completeness of the method in the
detection of close pairs. Fig. 13 compares the number of pairs
having one member with at least 7 and 12 counts in the
simulated data with the detected number of pairs. The numbers
of pairs in the simulated data have been corrected by dividing
them by the square of the completeness expected at their counts
thresholds (87.5% for the pairs with at least 12 counts and 68%
for the pairs with at least 7 counts). To compare the number of
pairs in the simulated data correctly with the detected number
of pairs, it is necessary to take into account that the
detected number of pairs is not complete at the chosen
significance level, and moreover that each pair must be
corrected for the completeness of both sources in the pair,
that is, the square of the completeness. We see that at
distances smaller than 5 arcsec we miss between 50% and 70% of
the pairs with at least 12 counts and between 70% and 80% of
the pairs with more than 7 counts. The reason is that it is
increasingly difficult to detect a faint (7 or 12 counts)
source near a bright source, because of the wings of the PSF of
the latter. Indeed, all recovered pairs have a counts ratio
< 3, while about 40% of the input pairs have a counts ratio
> 3, none of which are detected in our analysis.

Fig. 6. Left panel: shift between the input simulated source
positions and the source positions found by emldetect, using a
matching radius of 2 arcsec (black solid dots). The solid black
lines represent the zero shifts. The red circles have a radius
of 0.5, 1, and 2 arcsec, respectively. Right panel: shift
between the input simulated source positions and the source
positions found by PWDetect, using a matching radius of 2
arcsec (black solid dots). Symbols as in the left panel.

Fig. 7. Top panel: number of pairs in the F band detected by
PWDetect (black empty histogram) and emldetect (green solid
histogram), as a function of the separation. Bottom panel:
ratio between the difference between the number of pairs
detected by PWDetect and emldetect, and the pairs detected by
PWDetect, as a function of the separation.

Fig. 8. The PWDetect (red dashed histogram) and emldetect
(solid black histogram) best fit count rates over the input
count rates in the F (left panel), S (center panel), and H
(right panel) band. The dotted vertical line corresponds to the
exact match between the evaluated count rates and the input
count rates. Note that the emldetect count rates are in good
agreement with the input count rates.

Fig. 9. Left panel: the ratio between the best fit count rates
obtained by emldetect and the input count rates as a function
of the input count rates for the simulations in the F (blue
solid circles), S (green open circles), and H (red open
squares) band. The solid line is the exact match between the
best fit count rates and the input count rates. Right panel:
the difference between the emldetect count rate and the input
count rate divided by the emldetect error on the count rate, as
a function of the count rate for the sources detected in the S
band.

Fig. 10. The distributions of the ratio between the deviation
between detected positions by PWDetect and input positions and
the X-ray error on the position (horizontal axis: detected pos.
- simulated pos. / error pos.) for the simulations in three
energy bands (blue: F band; green: S band; red: H band). The
solid curve is the expectation based on Gaussian statistics.

TABLE 2
Flux limit and Completeness

Completeness   F(0.5-10 keV)            F(0.5-2 keV)             F(2-10 keV)
%              erg cm^-2 s^-1           erg cm^-2 s^-1           erg cm^-2 s^-1
90             ...                      $1.1 \times 10^{-15}$    $7.8 \times 10^{-15}$
80             ...                      ...                      $6.1 \times 10^{-15}$
50             $1.7 \times 10^{-15}$    $4.5 \times 10^{-16}$    $2.9 \times 10^{-15}$
20             $1.1 \times 10^{-15}$    $3.3 \times 10^{-16}$    $2.0 \times 10^{-15}$
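
The completeness and reliability definitions given at the start of this section can be written down directly. The sketch below is only an illustration of those two ratios; the function names and the input numbers are hypothetical and are not taken from the simulations.

```python
def completeness(n_detected_true, n_input):
    """Ratio of detected input sources to all input simulated sources."""
    return n_detected_true / n_input


def reliability(n_spurious, n_input):
    """One minus the ratio of spurious detections to input simulated sources."""
    return 1.0 - n_spurious / n_input


if __name__ == "__main__":
    # Hypothetical counts, chosen only to show the arithmetic.
    n_input = 2000          # simulated sources above a counts threshold
    n_detected_true = 1750  # detections matched to an input source
    n_spurious = 6          # detections with no input counterpart
    print(completeness(n_detected_true, n_input))  # 0.875
    print(reliability(n_spurious, n_input))
```

Evaluating both functions on the same simulated catalog for a range of detection thresholds gives curves of the kind shown in Fig. 11.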

6. OBSERVED DATA: SOURCE DETECTION AND 
COUNT RATES 

Source detection and characterization were performed on the
real, observed, event files using the approach described in
Sect. 4.4. The three energy bands, F, S, and H, were used. The
candidate catalogs produced by PWDetect, used as input for
emldetect, were cut at a low threshold of ~ $10^{-3}$,
corresponding to 5 counts. The number of PWDetect source
candidates in each of the three bands was between 2500 and
3500. These lists were visually cleaned to identify obviously
spurious detections on the wings of the Chandra PSF around
bright sources and near the edges of the ACIS-I chips,
following the same procedure adopted for the simulated data
(see Sect. 4). As for the simulations, the number of clearly
spurious detections is small in all three bands (< 1-2%). At
the chosen probability (i.e., significance level
$2 \times 10^{-5}$ or DET_ML=10.8), the number of spurious
detections is presumably << 12 in the total catalog (i.e., F,
S, and H band). The total catalog is obtained by the
cross-correlation of the three single band (i.e., F, S, and H)
catalogs. In this way the numbers of spurious sources in the
single F, S, and H bands, evaluated by the detailed analysis of
the simulations (see Sect. 5), are no longer independent. As a
result the total number of spurious sources is less than the
sum of the spurious sources in each of the three single bands.

Fig. 11. Completeness (solid and dot-short dashed lines, left
y axis) and reliability (long dashed and short dashed lines,
right y axis) as a function of the significance level for the
simulations in the F (left panel), S (center panel) and H
(right panel) band, for sources with at least 12 counts (solid
and long dashed lines, respectively) and at least 7 counts
(dot-short dashed and short dashed lines, respectively). The
dotted vertical black lines indicate the chosen significance
level of $2 \times 10^{-5}$.

Fig. 12. The crosses represent the completeness as a function
of the flux at the chosen significance level of
$2 \times 10^{-5}$, in F band (blue crosses), S band (green
crosses), and H band (red crosses). The dashed lines connect
the corresponding crosses. The solid lines represent the
sky-coverage calculated as in Sect. 7.2 and normalized to the
maximum sky-coverage. The horizontal black dashed and solid
lines indicate 5 completeness fractions.

Fig. 13. Top panel: the number of pairs in the F band detected
by PWDetect (black solid histogram), compared to the number of
pairs in the simulations having one member with at least 7
counts (dotted histogram) and 12 counts (dashed histogram) as a
function of the separation. Bottom panel: ratio between the
pairs detected by PWDetect and the number of pairs in the
simulations with 7 or 12 counts as a function of the
separation. In both panels the numbers of pairs in the
simulations are corrected by dividing them by the square of the
completeness expected at their counts thresholds (see text).
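
The detection threshold is quoted both as a chance probability and as a DET_ML value. The short sketch below converts between the two, assuming the relation DET_ML = -ln(P). Eq. 1 is not reproduced in this section, so this relation is an assumption here, although it does reproduce the value pairs quoted in the text. The function names are illustrative.

```python
import math


def det_ml_from_probability(p_random):
    """DET_ML for a chance probability, assuming DET_ML = -ln(P)."""
    return -math.log(p_random)


def probability_from_det_ml(det_ml):
    """Inverse of the assumed relation: P = exp(-DET_ML)."""
    return math.exp(-det_ml)


if __name__ == "__main__":
    print(round(det_ml_from_probability(2e-5), 1))  # 10.8
    print(round(det_ml_from_probability(4e-6), 1))  # 12.4
```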

6.1. Source position

Fig. 14 (left panel) shows the positional error, evaluated
using the empirical technique described in Sect. 4.3.1, as a
function of the off-axis angle. The notch in the figure arises
because, at a fixed off-axis angle, the $\mathrm{PSF}_{radius}$
is ~constant, while $\sqrt{C_s}$ is a discrete variable, since
T are integer numbers and B are small. The error is typically
less than ~ 0.5 arcsec at the smallest off-axis angles,
$\theta_i < 2$ arcmin, and then increases to 1-2 arcsec for
$\theta_i > 2$ arcmin. Most of the scatter at a given off-axis
angle in this figure is due to the range of count rates of the
sources. Fig. 14 (right panel) shows the positional error in
the F band as a function of the source count rate in 4 off-axis
bins. Both figures show that the quality of the data is good
enough to provide positions with sub-arcsec accuracy, except
for < 12% of F band sources (i.e., 202 sources), and for
~ 13.5% of the entire source catalog (see Paper I for more
details). These small positional errors are the key to the high
identification rate of the C-COSMOS sources with optical and
infrared counterparts (Civano et al. 2009, Paper III).

6.2. Count rates 

Vignetting corrected count rates for each source are obtained
by dividing the best-fit counts derived from emldetect for each
band and in each single field by the net exposure time, reduced
by the vignetting at the position of each source, as in the
exposure maps$^{23,24}$. The exposure maps are computed
averaging over an area of 8 pixels to smooth out CCD gaps and
cosmetic defects, and are weighted with an absorbed power-law
spectral model with an energy index $\alpha_E = 0.4$ and the
Galactic column density of the COSMOS field,
$N_H = 2.7 \times 10^{20}$ cm$^{-2}$.

The errors on count rates at 68% confidence level were then
computed using the equation:

$$ \mathrm{Error} = \frac{\sqrt{C_s + (1 + a) \cdot B}}{0.9 \cdot T_{expo}} \qquad (2) $$



where $C_s$ are the source net counts estimated by emldetect,
corrected to an area including 90% of the PSF (see note 19), B
are the background counts from the emldetect background maps
(counts/pixel$^2$) multiplied by a circular area of radius
corresponding to $f_{psf}$ = 90%, and $T_{expo}$ is the
vignetting corrected exposure time at the position of the
source from the exposure maps. The parameter a accounts for the
fact that the background at the source position is not known
with infinite precision. a = 1 corresponds to the situation of
a background area equal to the source extraction area, which
for Chandra is always very small because of the very good PSF;
a = 0 would correspond to assuming no uncertainty on the
estimate of the average value of the background. Unfortunately
emldetect provides neither the B errors nor the information on
the size of the region used to measure the background counts.
Because of the way emldetect estimates the background counts,
i.e., by a fit using a sophisticated background modeling
(Cappelluti et al. 2007), we are in an intermediate situation
between the two extreme cases a = 0 and a = 1. For this reason,
we chose to adopt a = 0.5. This ensures that we are not
under-estimating the error on the background, even for sources
close to problematic areas like the edge of the field or CCD
gaps. We chose an area corresponding to $f_{psf}$ = 90%,
because this is the typical size of the area where emldetect
works for relatively bright sources. We checked that the errors
computed using Eq. 2 agree well with the errors evaluated from
aperture photometry (Sect. 6.4).

23 http://hea-www.harvard.edu/~clvis/CCOSMOS.html

24 http://irsa.ipac.caltech.edu/data/COSMOS/
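
Eq. 2 is simple enough to evaluate directly. The sketch below is a minimal transcription of it, with a = 0.5 as the default adopted in the text; the function and argument names are illustrative, and the inputs (net counts, background counts in the 90% $f_{psf}$ area, vignetting corrected exposure) are assumed to be already available.

```python
import math


def count_rate_error(net_counts, bkg_counts, t_expo, a=0.5):
    """68% error on the count rate following Eq. 2.

    net_counts: source net counts C_s (corrected to the 90% PSF area).
    bkg_counts: background counts B within the 90% f_psf radius.
    t_expo: vignetting corrected exposure time T_expo.
    a: background-uncertainty parameter (0 = background perfectly known,
       1 = background area equal to the source area; 0.5 adopted in the text).
    """
    return math.sqrt(net_counts + (1.0 + a) * bkg_counts) / (0.9 * t_expo)


if __name__ == "__main__":
    # Illustrative values only: 20 net counts, 2 background counts, 100 ks.
    print(count_rate_error(20.0, 2.0, 100.0))  # counts/ks
```

The error is returned in the inverse of whatever time unit is used for t_expo (counts/ks in the example).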

Fig. 15 plots the signal-to-noise ratio$^{25}$ of each source
as a function of DET_ML. Note the regular behavior of the
signal-to-noise ratio, which increases smoothly and
monotonically with increasing DET_ML, or with decreasing
$P_{random}$ (see Eq. 1), with a small dispersion around the
correlation. The six diagonal black lines show the expectations
computed for six values of the background in the detection
cell, from 0.5 counts to 8 counts. This range is centered on
~ 4 counts, a value typical for the C-COSMOS survey (see
Sect. 2), and accounts for two effects: a) the differences in
exposure time and b) the different sizes of the source
extraction region as a function of the off-axis angle, due to
the variation of the Chandra PSF with the off-axis angle. This
range of background counts explains most of the observed
dispersion in Fig. 15, especially for the faintest sources. For
the brightest sources in the F band, the best fit DET_ML is
somewhat smaller than expected based on the signal-to-noise
ratio, even for the case of a background of 8 counts per
detection cell. This can be explained if the fit of bright
sources is performed over an area significantly larger than the
90% $f_{psf}$ area, which therefore does not fully optimize the
signal-to-noise ratio. This shift is smaller for the S band
sources because of the smaller background in this band with
respect to the F band.

6.3. Fluxes 

The emldetect count rates (R) were converted to fluxes ($F_x$)
using the formula $F_x = R/(CF \cdot 10^{11})$, where CF is the
energy conversion factor, which is evaluated by using spectra
simulated through Xspec$^{26}$, including the appropriate
on-axis response matrix and the chosen spectral models. We used
energy conversion factors of 0.742 counts erg$^{-1}$ cm$^2$,
1.837 counts erg$^{-1}$ cm$^2$, and 0.381 counts erg$^{-1}$
cm$^2$, appropriate for a power-law spectrum with energy index
$\alpha_E = 0.4$ and Galactic column density for the COSMOS
field ($N_H = 2.7 \times 10^{20}$ cm$^{-2}$), to convert the F
count rate into the 0.5-10 keV flux, the S count rate into the
0.5-2 keV flux, and the H count rate into the 2-10 keV flux,
respectively. We extend the F and H bands up to 10 keV to allow
an easier comparison with results in the literature. The
conversion factors are sensitive to the spectral shape: for
$\alpha_E = 1$ they change by ~ 40% in the F band, by less than
5% in the S band, and by less than 25% in the H band. For
absorbed power-law spectra with $N_H = 10^{22}$ cm$^{-2}$ and
$\alpha_E = 0.4$ or $\alpha_E = 1.0$, the conversion factors
change by up to ~ 46% in the F band, by up to ~ 17% in the S
band, and by up to ~ 18% in the H band (see Tab. 3). The
conversion factor for the F band depends more strongly on the
spectral shape because of the wider band.
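
The conversion from count rate to flux can be sketched as follows, using the baseline conversion factors quoted above ($\alpha_E = 0.4$, Galactic $N_H$). The dictionary and function names are illustrative; count rates are assumed to be in counts s$^{-1}$, so that fluxes come out in erg cm$^{-2}$ s$^{-1}$.

```python
# Baseline energy conversion factors from the text, in units of
# 1e11 counts erg^-1 cm^2 (alpha_E = 0.4, N_H = 2.7e20 cm^-2).
CONVERSION_FACTORS = {
    "F": 0.742,  # F count rate -> 0.5-10 keV flux
    "S": 1.837,  # S count rate -> 0.5-2 keV flux
    "H": 0.381,  # H count rate -> 2-10 keV flux
}


def count_rate_to_flux(rate, band, conversion_factors=CONVERSION_FACTORS):
    """Flux F_x = R / (CF * 1e11) for a count rate R in the given band."""
    return rate / (conversion_factors[band] * 1e11)


if __name__ == "__main__":
    # Illustrative rate of 1e-4 counts/s in each band.
    for band in ("F", "S", "H"):
        print(band, count_rate_to_flux(1e-4, band))
```

Passing a different dictionary (for example the Tab. 3 values for another spectral model) shows directly how strongly the derived flux depends on the assumed spectrum.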

6.4. Aperture photometry 

In addition to PSF fitting photometry, we have also performed
standard aperture photometry on the sources included in the
final emldetect catalog. We find an overall consistency between
the two estimates, with the emldetect count rates slightly
larger (by less than 10%) than the count rates from aperture
photometry. For each source in the catalog, aperture photometry
was performed in the F, S, and H bands with the yaxx
tool$^{27}$. The aperture photometry values are derived from
the event data of each individual Chandra observation in which
a source is located. For sources located in multiple
observations, the aperture photometry is performed in each of
the observations, and the resulting values are combined to
produce a single set of values, using the appropriate method
shown in Tab. 4.

To extract source counts, circular regions of radii
corresponding to 50%, 90%, and 95% $f_{psf}$, centered on each
source location, are used for each observation in which the
source is located. The radii are calculated using the off-axis
and azimuthal angles of the source in each observation, and
interpolating the circular $f_{psf}$ table provided by the CXC
calibration group to the nearest angles. Mean energies of 2
keV, 1.2 keV, and 3.6 keV were chosen for the F, S, and H band,
respectively. To extract background counts, annuli with the
inner edge at the 95% $f_{psf}$ radius plus 8 pixels, and with
a width of 40 pixels, are used. To limit contamination, all
sources that overlap with the source or background regions are
masked by using circular exclusion regions with the 95%
$f_{psf}$ radius. Exclusions can also come from the CCD edge,
with an 8 pixel padding inward from the edge. Aperture fluxes
for which the net source extraction area was less than 75% of
the available area (i.e., the original circle prior to
exclusions) are not given in the catalog.

Using the regions described above, photometry was extracted
using the CIAO tool dmextract. The source net counts were then
corrected for the fraction of $f_{psf}$. dmextract was also run
on the exposure maps with exactly the same regions in order to
compute the vignetting corrected exposure times, which are
needed to compute the source count rates.

Fig. 16 compares the count rates evaluated by emldetect with
the count rates evaluated by the aperture photometry. The
median and interquartile range of the count rate ratios are
1.03±0.16, 1.08±0.19, 1.07±0.18 in the S, H, and F band,
respectively.

25 Ratio between the source count rate and the error on the
source count rate at 68% confidence level

26 http://xspec.gsfc.nasa.gov/

27 http://cxc.harvard.edu/contrib/yaxx/

Fig. 14. Left panel: the error on the source position as a
function of the off-axis angle for sources detected in the F
band. Right panel: the positional error in the F band as a
function of the source count rate in 4 off-axis bins: filled
circles = off-axis < 2'; open squares = 2' < off-axis < 4';
filled squares = 4' < off-axis < 6'; filled triangles =
off-axis > 6'. The off-axis angle is the distance of the source
candidate position from the aim point of the pointing where the
source position is measured with the best PSF (see Sect. 4.1).

Fig. 15. The signal-to-noise ratio of each source as a function
of DET_ML. Filled circles = F sources; open circles = S
sources; open squares = H sources. The six diagonal black lines
correspond to the expectations assuming a background of 8, 4,
2, 1, and 0.5 counts in the detection cell, from top to bottom.

Fig. 16. Ratio between the count rates evaluated by emldetect
and count rates evaluated by the aperture photometry as a
function of the emldetect count rates, for the F sources (blue
filled dots), S sources (green open dots), and H sources (red
open squares). The solid black line is the exact match between
the emldetect count rates and the aperture photometry count
rates.

TABLE 3
Conversion factors for count rates to fluxes

$\alpha_E$   $N_H$                  CF(F)^a             CF(S)^a             CF(H)^a
             $10^{22}$ cm$^{-2}$    cts erg^-1 cm^2     cts erg^-1 cm^2     cts erg^-1 cm^2
0.4          0.027                  0.742               1.837               0.381
1            0.027                  1.042               1.759               0.474
0.4          1                      0.508               2.12                0.361
1            1                      0.712               2.151               0.447

a Energy conversion factor to convert the F count rate into the
0.5-10 keV flux (CF(F)), the S count rate into the 0.5-2 keV
flux (CF(S)), and the H count rate into the 2-10 keV flux
(CF(H)), using the formula $F_x = R/(CF \cdot 10^{11})$ and
appropriate for an absorbed power-law spectrum with the listed
$N_H$ and $\alpha_E$.

6.5. Upper limits 

If a source is not detected in one band, we give the 90% upper
limits to the source count rates and fluxes in this band. The
upper limits are computed as follows: if T is the total number
of counts measured at the position of a source not satisfying
our detection threshold, B are the expected background counts,
and X are the unknown counts from the source, the 90% upper
limit on X (X(90%)) can be defined as the number of counts that
gives a 10% probability of observing T (or fewer) counts.
Applying the Poisson probability distribution function, X(90%)
is therefore obtained by iteratively solving the following
equation for different X values:

$$ 0.1 = e^{-(X+B)} \sum_{i=0}^{T} \frac{(X+B)^i}{i!} \qquad (3) $$

(see e.g., Narsky 2000). We collected the counts T both from a
region of 5 arcsec radius and from the aperture photometry
discussed in Section 6.4. The results were always statistically
consistent with each other. The X(90%) upper limits derived
with Eq. 3 do not take into account the statistical
fluctuations on the expected number of background counts. In
order to take the background fluctuations into consideration,
we used the following procedure: if $\sigma(B)$ is the root
mean square of B (e.g., $\sigma(B) = \sqrt{B}$ for large B), we
estimated the 90% lower limit on B as
$B(90\%) = B - 1.282 \cdot \sigma(B)$ $^{28}$ and, as a
consequence, the "correct" 90% upper limit on X becomes:

$$ X_{corr}(90\%) = X(90\%) + 1.282 \cdot \sigma(B) \qquad (4) $$

We used $X_{corr}(90\%)$ as upper limits for C-COSMOS sources.
We also evaluated the upper limits following the method
described in Kashyap et al. (2009). Comparing the upper limits
obtained using the two methods, we found that our upper limits
are generally more conservative (i.e., higher) than those which
would be derived using the method by Kashyap et al. (2009).
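
The iterative solution of Eq. 3 and the correction of Eq. 4 can be sketched as below. This is a minimal illustration, not the procedure actually used for the catalog: bisection is one possible way to iterate, $\sigma(B) = \sqrt{B}$ is the large-B form quoted in the text, and all function names are hypothetical.

```python
import math


def poisson_cdf(n, mu):
    """P(k <= n) for a Poisson distribution of mean mu."""
    term = math.exp(-mu)  # k = 0 term
    total = term
    for k in range(1, n + 1):
        term *= mu / k
        total += term
    return total


def upper_limit_90(total_counts, bkg_counts, prob=0.1, tol=1e-6):
    """Solve Eq. 3 for X(90%) by bisection (returns 0 if no X > 0 is needed)."""
    if poisson_cdf(total_counts, bkg_counts) <= prob:
        return 0.0
    lo, hi = 0.0, 10.0
    # The CDF decreases with the mean, so enlarge hi until it brackets the root.
    while poisson_cdf(total_counts, hi + bkg_counts) > prob:
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if poisson_cdf(total_counts, mid + bkg_counts) > prob:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def corrected_upper_limit_90(total_counts, bkg_counts):
    """Eq. 4: add 1.282 * sigma(B), with sigma(B) = sqrt(B) assumed."""
    sigma_b = math.sqrt(bkg_counts)
    return upper_limit_90(total_counts, bkg_counts) + 1.282 * sigma_b


if __name__ == "__main__":
    # Illustrative case: 6 total counts on an expected background of 3.
    print(upper_limit_90(6, 3.0))
    print(corrected_upper_limit_90(6, 3.0))
```

For T = 0 and B = 0 the solution reduces to the familiar value X(90%) = ln 10, which is a convenient sanity check of the solver.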

7. SURVEY SENSITIVITY AND SKY-COVERAGE 

7.1. Survey sensitivity 

In X-ray observations the sensitivity, i.e., the flux limit, is
not uniform over the field of view (FOV), for two main reasons:
(1) the variable size of the PSF, which determines the
background counts that limit the source detection; and (2) the
vignetting of the effective area. In C-COSMOS, where we have
used multiple overlapping pointings giving different PSFs and
vignetting factors for each observation of each source, the
problem of assessing the sensitivity at each position in the
field of view is more complex than normal. To evaluate the
C-COSMOS survey sensitivity, we have developed a dedicated
procedure by adapting the analytical method used for the
simpler case of the ELAIS-S1 mosaic (Puccetti et al. 2006 and
references therein) to the more complicated C-COSMOS mosaic. In
this procedure, the full C-COSMOS field is divided into a grid
of positions with a spacing of 4 pixels, i.e., 2 arcsec. This
bin size is a suitable balance between the spatial resolution
of the C-COSMOS survey and the memory (RAM) required for
computing the sensitivity maps. At each point of the 2 arcsec
grid, we evaluated the minimum number of counts $C_{min}$
needed to exceed the fluctuations of the background, assuming
Poisson statistics with a significance level equal to that used
for the catalog (i.e., $2 \times 10^{-5}$, see Sect. 6.1),
according to the following formula:

$$ P_{Poisson} = e^{-B} \sum_{k=C_{min}}^{\infty} \frac{B^k}{k!} = 2 \times 10^{-5} \qquad (5) $$

where B is the total background counts computed at the position
of each point ($P_j$) of the grid by $B = \sum_{i=1}^{n} B_i$,
where i runs from 1 to the number n of overlapping fields at
the position of each $P_j$ and $B_i$ are the background counts
computed using the background map of each Chandra pointing
covering the position, in a region centered at $P_j$ and of
radius $R_i$. $R_i$ corresponds to a fixed value of $f_{psf}$,
and is evaluated from the distance between $P_j$ and the aim
point of each single Chandra pointing covering the position,
using the CXC calibration. We solved Eq. 5 iteratively to
calculate $C_{min}$. The count rate limit, $R_{lim}$, at each
point of the grid is then computed by:

$$ R_{lim} = \frac{C_{min} - B}{f_{psf} \cdot T_{expo}} \qquad (6) $$

where $T_{expo}$ is the total, vignetting corrected, exposure
time at each position of the grid, read from the merged
C-COSMOS exposure map (see notes 20, 22).

28 The value 1.282 is the value appropriate for the 90%
probability (see e.g., Bevington & Robinson 1992). This
approximate formula produces 90% limits which differ by ~ 10%
(4%) from the exact estimate for values of B = 5 (10) in the
extraction region, corresponding to 0.064 (0.128)
cts/arcsec$^2$ (see Fig. 3).

TABLE 4
Merge methods

Parameter                                    Symbol       Merge method
exposure time corrected for the vignetting   T_expo       $\sum_i T_{expo,i}$
counts                                       T            ...
background counts                            B            ...
net counts                                   C_s          ...
errors on counts                             err_T        ...
errors on background counts                  err_B        ...
errors on net counts                         err_C_s      ...
count rates                                  R            ...
net count rates                              R_s          ...
count rate errors                            err_R        ...
net count rate errors                        err_R_s      ...

NOTE. The index i indicates each of the observations where a
source is located.

Finally, the flux limits at each Pj are computed using 
the same conversion factor used for the real C-COSMOS 
sources. This procedure is applied to the S, H, and F 
bands to produce binned sensitivity maps. 
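
For a single grid point, the calculation of $C_{min}$ and of the count rate limit can be sketched as follows. This is an illustration under stated assumptions: the Poisson tail probability is taken as the chance of observing at least $C_{min}$ counts from background alone, $C_{min}$ is the smallest integer for which that probability does not exceed the significance level, and the count rate limit uses the net counts $C_{min} - B$. Function names are hypothetical.

```python
import math


def poisson_tail(c, b):
    """P(k >= c) for a Poisson distribution of mean b."""
    term = math.exp(-b)  # k = 0 term
    cdf = 0.0
    for k in range(c):
        cdf += term
        term *= b / (k + 1)
    return 1.0 - cdf


def min_counts(b, significance=2e-5):
    """Smallest integer C_min with P(k >= C_min | B) <= significance."""
    c = 0
    while poisson_tail(c, b) > significance:
        c += 1
    return c


def count_rate_limit(b, t_expo, f_psf=0.5, significance=2e-5):
    """Count rate limit (C_min - B) / (f_psf * T_expo) at one grid point."""
    return (min_counts(b, significance) - b) / (f_psf * t_expo)


if __name__ == "__main__":
    # Illustrative grid point: summed background B = 1 count,
    # vignetting corrected exposure of 100 ks, f_psf = 0.5.
    print(min_counts(1.0))               # 8
    print(count_rate_limit(1.0, 100.0))  # counts/ks
```

Repeating this at every point of the 2 arcsec grid, with B summed over the overlapping pointings and the merged exposure map supplying t_expo, would give a count rate sensitivity map that can then be converted to flux.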

7.2. Sky-coverage 

The "sky-coverage" is the integral of the survey area covered
down to a given flux limit, as a function of the flux in the
sensitivity map. The solid lines in Fig. 12 are the normalized
sky-coverages, computed using the procedure described above and
adopting $f_{psf}$ = 0.5. We studied how the sensitivity maps
and the sky-coverage depend on the assumed $f_{psf}$ and found
that they change by less than a few per cent for $f_{psf}$
values up to 0.90. We also studied how the sensitivity maps
change when using different $f_{psf}$ values at the different
off-axis angles at which a single source is observed, finding
again very little change. The reason for this behaviour is the
relatively low background within each $R_i$, even at large
off-axis angles.

A relatively large uncertainty in the computation of the
sensitivity maps and sky coverage comes instead from the
unknown spectrum of the sources near the detection limit. The
magnitude of this uncertainty depends on the width of the
energy band, and is therefore largest in the F band. To
estimate the magnitude of the uncertainty, we calculated the
sky coverage for power-law spectra with $\alpha_E = 1.0$, and
for absorbed power-law spectra with $\alpha_E = 0.4$ or
$\alpha_E = 1.0$ and $N_H = 10^{22}$ cm$^{-2}$, in addition to
the baseline case ($\alpha_E = 0.4$,
$N_H = 2.7 \times 10^{20}$ cm$^{-2}$; see Fig. 17). At the flux
limits corresponding to 90% completeness (see Tab. 2) the
deviations are less than 3%, ~ 3%, and ~ 16% for the S, H, and
F bands, respectively. This uncertainty related to the unknown
spectrum of the sources becomes significant only at fluxes
below the 50% completeness.
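
Given a binned sensitivity map, the sky-coverage defined above is a cumulative area. The sketch below illustrates this for a flat list of per-cell flux limits; the function name and the toy inputs are hypothetical, and every grid cell is assumed to subtend the same area (a 2 arcsec by 2 arcsec cell for the grid described in Sect. 7.1).

```python
import bisect


def sky_coverage(flux_limits, cell_area_deg2, fluxes):
    """Area (deg^2) covered down to each flux in `fluxes`.

    flux_limits: flux limit of every cell of the sensitivity map.
    cell_area_deg2: area of one cell, assumed equal for all cells.
    fluxes: fluxes at which the sky-coverage is evaluated.
    """
    limits = sorted(flux_limits)
    # A cell contributes at flux f if its flux limit is <= f.
    return [bisect.bisect_right(limits, f) * cell_area_deg2 for f in fluxes]


# Area of one 2 arcsec x 2 arcsec grid cell in deg^2.
CELL_AREA_DEG2 = (2.0 / 3600.0) ** 2

if __name__ == "__main__":
    toy_map = [1e-15, 2e-15, 2e-15, 5e-15]  # illustrative flux limits
    print(sky_coverage(toy_map, CELL_AREA_DEG2, [5e-16, 2e-15, 1e-14]))
```

Dividing the result by its maximum gives a normalized curve of the kind plotted as solid lines in Fig. 12.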

7.3. The log N - log S

We used the catalogs of the sources detected in the sim- 
ulations in the S, H, and F bands, and the sky-coverage 
curves computed in Sect. 7.2 to obtain the number 
counts (log A - log S) of the sources detected in the sim- 
ulations. We cut the catalogs in the S, H, and F band at 
a signal-to-noise ratio higher than 2, 2.5, and 2.8, respec- 
tively. These cuts are introduced because: (1) we do not 
correct for Eddington bias, which may be strong (up to 
30-50%) at the lowest flux limits; (2) low signal-to-noise 
implies a large statistical uncertainty in the flux, which 
in turn would introduce a large uncertainty on the num- 
ber counts at the lowest fluxes; (3) at the lowest fluxes, 
the sky-coverage is small, and the relative statistical and 
systematic errors are therefore large, again introducing 
large uncertainties in the number counts. We chose the
signal-to-noise thresholds by requiring that the deviations
between the log N - log S computed from the detections
and the input log N - log S are smaller than 5%.
The log N - log S are shown in Fig. 18. The flux limits implied
by the signal-to-noise thresholds are $\sim 2.3 \times 10^{-16}$,
$\sim 1.6 \times 10^{-15}$, and $\sim 9.6 \times 10^{-16}$ erg s$^{-1}$ cm$^{-2}$ for the
0.5-2 keV, 2-10 keV, and 0.5-10 keV bands, respectively.
These flux limits are fully consistent with the flux limits
of the log N - log S derived from the observed data (see
Paper I).
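
The way a catalog and a sky-coverage curve combine into number counts can be sketched as follows. This is a minimal illustration of the standard estimator, in which each source kept after the signal-to-noise cut contributes the inverse of the sky-coverage at its flux to the cumulative counts. The function names, the log-linear interpolation of the sky-coverage curve, and all numerical values are assumptions made for illustration, not taken from the C-COSMOS pipeline.

```python
import math
from bisect import bisect_left


def interp_coverage(flux, cov_flux, cov_area):
    """Sky-coverage (deg^2) at `flux`, interpolated linearly in log10(flux).

    `cov_flux` must be sorted in increasing order; outside the tabulated
    range the curve is clamped to its end values (an assumption).
    """
    if flux <= cov_flux[0]:
        return cov_area[0]
    if flux >= cov_flux[-1]:
        return cov_area[-1]
    i = bisect_left(cov_flux, flux)  # cov_flux[i-1] < flux <= cov_flux[i]
    x0, x1 = math.log10(cov_flux[i - 1]), math.log10(cov_flux[i])
    w = (math.log10(flux) - x0) / (x1 - x0)
    return cov_area[i - 1] + w * (cov_area[i] - cov_area[i - 1])


def cumulative_counts(fluxes, snrs, snr_min, cov_flux, cov_area):
    """Return [(S, N(>=S))], brightest source first, N in sources per deg^2."""
    # Keep only sources above the signal-to-noise threshold.
    kept = sorted((s for s, q in zip(fluxes, snrs) if q > snr_min), reverse=True)
    counts, total = [], 0.0
    for s in kept:
        # Each source is weighted by the inverse of the area in which
        # a source of that flux could have been detected.
        total += 1.0 / interp_coverage(s, cov_flux, cov_area)
        counts.append((s, total))
    return counts


# Made-up sky-coverage curve and catalog, for illustration only.
cov_flux = [1e-16, 1e-15, 1e-14]  # erg s^-1 cm^-2
cov_area = [0.1, 0.5, 0.9]        # deg^2
fluxes = [3e-16, 1e-15, 5e-15, 2e-16]
snrs = [2.5, 4.0, 9.0, 1.5]

for s, n in cumulative_counts(fluxes, snrs, 2.0, cov_flux, cov_area):
    print(f"S >= {s:.1e}: N = {n:.1f} deg^-2")
```

In this sketch the faintest sources carry the largest weights, because the sky-coverage is smallest there. This is why errors on the sky-coverage at the lowest fluxes propagate strongly into the number counts.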

Fig. 17. The sky-coverage calculated as in Sect. 7.2 for the
0.5-10 keV (top panel), 0.5-2 keV (middle panel), and 2-10 keV (bottom
panel) bands. The black solid lines represent the sky-coverages
evaluated with the baseline model (i.e., power-law spectra with
$\alpha_E = 0.4$ absorbed by Galactic $N_H = 2.7 \times 10^{20}$ cm$^{-2}$). The cyan
long-dashed lines represent the sky-coverages for power-law spectra
with $\alpha_E = 1$ absorbed by Galactic $N_H = 2.7 \times 10^{20}$ cm$^{-2}$. The
blue short-dashed lines represent the sky-coverages for power-law
spectra with $\alpha_E = 0.4$ absorbed by $N_H = 10^{22}$ cm$^{-2}$. The red
dotted lines represent the sky-coverages for power-law spectra with
$\alpha_E = 1$ absorbed by $N_H = 10^{22}$ cm$^{-2}$. The black dot-long-dashed
vertical lines represent the fluxes corresponding to the 90% and 50%
completeness, respectively.

8. COMPARISON WITH AEGIS-X

Chandra was used to perform a survey somewhat
similar to C-COSMOS in the Extended Groth Strip
(AEGIS-X, Laird et al. 2009). The 1.6 Ms AEGIS-X
survey is made of 8 ACIS-I pointings, each of a nominal
200 ksec exposure, with very little overlap, covering
~ 0.67 deg$^2$. While the effective exposure time and area
coverage are similar to C-COSMOS (see Fig. 19), the
tiling is completely different. In C-COSMOS each source
in the central area is observed at four to six different
off-axis angles, while in AEGIS-X each source is observed
only at one off-axis angle.

To compare the two surveys quantitatively, we cut the
C-COSMOS catalog at the same significance level used
for AEGIS-X (i.e., $4 \times 10^{-6}$ or DET_ML=12.4, Laird et
al. 2009). We also recomputed the C-COSMOS
sky-coverage using the same significance level. Fig. 19
compares the C-COSMOS sky-coverage to the AEGIS-X 
one computed without the Bayesian correction for the 
Eddington bias. The C-COSMOS sky-coverage has a 
significantly sharper drop toward lower fluxes than the 
AEGIS-X sky-coverage. This means that the sensitivity 
in C-COSMOS is more uniform over the field than in 
AEGIS-X, while the AEGIS-X tiling reaches fainter lim- 
iting fluxes than C-COSMOS. The estimated AEGIS-X 
flux limit in the S band is 50% deeper than C-COSMOS, 
while the flux limits in the H and F bands are about twice 
as deep as C-COSMOS, albeit in small areas. The deeper 
AEGIS-X flux limit in the H and F bands with respect to 
the S band depends on the higher internal background in 
these bands and on the smaller typical source extraction
regions in the areas of best sensitivity of AEGIS-X with
respect to C-COSMOS. In fact, AEGIS-X has a PSF better
than ~ 1 arcsec over an area of ~ 0.15 deg$^2$, while
the complex C-COSMOS tiling implies effective source
extraction regions of radii of ~ 3 arcsec over most of the
area.

The more characteristic flux limits corresponding to 
90% completeness in the F, S, and H bands are similar 
in C-COSMOS and AEGIS-X, while the AEGIS-X flux 
limits corresponding to 50% completeness in the F, S, 
and H bands are lower than C-COSMOS by a factor 2-3 
(see Tab. 5).

The more uniform sensitivity of C-COSMOS over the
field yields a higher source density (see Tab. 5).

In C-COSMOS we estimate a slightly lower number of
spurious sources at a higher significance level (i.e., $2 \times 10^{-5}$
vs. $4 \times 10^{-6}$, see Tab. 5) with respect to the AEGIS-X
survey. The number of spurious sources is roughly given by
the product of the significance level times the number of 
independent detection cells in the field. The combination 
of different PSFs at each C-COSMOS position produces 
an effective source extraction region of ~ 3 arcsec radius, 
i.e., significantly wider than the Chandra PSF at off-axis 
angles smaller than 5-6 arcmin. This means that the 
number of independent cells per unit area is smaller in 
C-COSMOS than in AEGIS-X. In conclusion, the lower 
number of spurious detections in C-COSMOS with re- 
spect to AEGIS-X at a given significance level is due to 
the fact that each field is observed more than once at dif- 
ferent off-axis angles and therefore with different PSFs. 
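
As a rough order-of-magnitude illustration of the scaling stated above (number of spurious sources roughly equal to the significance level times the number of independent detection cells), the sketch below counts cells by dividing the survey area by the area of a circular cell. The circular-cell approximation, the function names, and the use of the full ~ 0.92 deg$^2$ C-COSMOS area with a 3 arcsec cell radius are assumptions made for illustration; this is not the procedure used to derive the values in Tab. 5.

```python
import math

ARCSEC2_PER_DEG2 = 3600.0 ** 2  # square arcsec in one square degree


def independent_cells(area_deg2, cell_radius_arcsec):
    """Approximate number of independent circular cells tiling the area."""
    cell_area_arcsec2 = math.pi * cell_radius_arcsec ** 2
    return area_deg2 * ARCSEC2_PER_DEG2 / cell_area_arcsec2


def expected_spurious(significance, area_deg2, cell_radius_arcsec):
    """Expected spurious detections: significance level times number of cells."""
    return significance * independent_cells(area_deg2, cell_radius_arcsec)


# Illustrative inputs: C-COSMOS threshold, total area, effective cell radius.
n_spurious = expected_spurious(2e-5, 0.92, 3.0)
print(f"expected spurious sources: {n_spurious:.1f}")
```

With these inputs the estimate is about 8 spurious sources, the same order of magnitude as the values listed in Tab. 5. Because the number of cells scales as the inverse square of the cell radius, halving the radius would quadruple the estimate, which is the sense in which a wider effective extraction region reduces the number of spurious detections at a given threshold.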

9. CONCLUSION 

The complex tiling of the C-COSMOS survey required the
development of a tailored multistep procedure to fully 
exploit the data. Detailed simulations were used to test 
different detection (sliding cell and wavelet) and photom- 
etry (PSF fitting and aperture photometry) algorithms. 
In particular, we compared the results obtained using 
the SAS eboxdetect and emldetect tasks, used for the 
XMM-COSMOS survey (Cappelluti et al. 2007, 2009), 
with those obtained using the PWDetect code (Damiani 
et al. 1997). Through these tests we selected a procedure
consisting of first identifying source candidates
using PWDetect, and then performing accurate PSF fit- 
ting photometry and evaluating aperture photometry for 
each source candidate. In this way we obtained subarc- 
sec source localizations and accurate photometry even 
for partly blended sources. 

We set a threshold for source detection to $P = 2 \times 10^{-5}$,
which implies a completeness of 87.5% and 68% for
sources with at least 12 and 7 F band counts, respectively,
and 3 to 5 spurious detections in the F band at
the same count limits, respectively.

We evaluated the survey sensitivity and the sky-coverage
through an analytical method, tuned using
simulations. We then evaluated the log N - log S of the
sources detected in the simulations down to S, H, and F
band flux limits of $F_X \sim 2.3 \times 10^{-16}$, $\sim 1.6 \times 10^{-15}$, and
$\sim 9.6 \times 10^{-16}$ erg s$^{-1}$ cm$^{-2}$, respectively.

Finally, we compared the C-COSMOS survey to the
AEGIS-X survey, a Chandra survey with similar sky-coverage
and total exposure time, but using non-overlapping
ACIS-I pointings. We found that the complex tiling
of C-COSMOS helps in obtaining a contiguous area with
uniform sensitivity and somewhat higher source density.
The overlap of several pointings with different PSFs at
the same position produces an effective source extraction
region of ~ 3 arcsec radius, i.e., significantly wider
than the Chandra PSF at off-axis angles smaller than 5-6
arcmin. This produces a number of independent detection
cells per unit area smaller than in a single ACIS-I
pointing survey like AEGIS-X, which in turn implies a
smaller number of spurious sources at each given detection
threshold.

Fig. 18. log N - log S curves of the simulated sources (dashed red curves) compared with the log N - log S curves of the sources
detected in the simulations (black dots): top left panel: S band, top middle panel: 2-10 keV band, top right panel: 0.5-10 keV band. Ratio
between log N - log S curves of the simulated sources and the log N - log S curves of the sources detected in the simulations: bottom left
panel: 0.5-2 keV band, bottom middle panel: 2-10 keV band, bottom right panel: 0.5-10 keV band.

TABLE 5

Comparison between C-COSMOS and AEGIS-X

Parameter                       Units                   C-COSMOS F             AEGIS-X F              C-COSMOS S             AEGIS-X S              C-COSMOS H             AEGIS-X H
90% completeness (a)            erg cm$^{-2}$ s$^{-1}$  $4.3 \times 10^{-15}$  $4.0 \times 10^{-15}$  $1.1 \times 10^{-15}$  $1.1 \times 10^{-15}$  $8.0 \times 10^{-15}$  $6.2 \times 10^{-15}$
50% completeness (a)            erg cm$^{-2}$ s$^{-1}$  $1.8 \times 10^{-15}$  $6 \times 10^{-16}$    $5.1 \times 10^{-16}$  $1.4 \times 10^{-16}$  $1.8 \times 10^{-15}$  $9 \times 10^{-16}$
Observed source densities (b)   sources deg$^{-2}$      2110±68                1830±52                1700±61                1550±40                1320±54                1110±41
Number of spurious sources (c)  sources                 5                      5                      4                      5                      3                      5
Number of sources (d)           sources                 1655                   1221                   1340                   1032                   1017                   741

(a) Completenesses are evaluated using a significance level of $4 \times 10^{-6}$.

(b) The observed source densities are evaluated in the total AEGIS-X area, and in the central ~ 0.45 deg$^2$ area in C-COSMOS, which has an
effective exposure similar to that of the AEGIS-X survey, using a significance level of $4 \times 10^{-6}$.

(c) In C-COSMOS the spurious sources are evaluated using a significance level of $2 \times 10^{-5}$ in the total field, for each band. For AEGIS-X, Laird et al.
(2009), using simulations, found 0.58 spurious sources per 200 ksec field per band using a significance level of $4 \times 10^{-6}$, corresponding to 5 spurious
sources in the full AEGIS-X survey, in each band.

(d) Number of sources detected in each band in the total fields, using a significance level of $2 \times 10^{-5}$ and $4 \times 10^{-6}$ for C-COSMOS and AEGIS-X,
respectively.